|
A spoken dialog system is a computer system able to converse with a human with voice. It has two essential components that do not exist in a text dialog system: a speech recognizer and a text-to-speech module. In can be further distinguished from command and control speech systems that can respond to requests but do not attempt to maintain continuity over time. ==Components== * An automatic Speech recognizer (ASR) decodes speech into text. Domain-specific recognizers can be configured for language designed for a given application. A "cloud" recognizer will be suitable for domains that do not depend on very specific vocabularies. * Natural language understanding transforms a recognition into a concept structure that can drive system behavior. Some approaches will combine recognition and understanding processing but are thought to be less flexible since interpretation has to be coded into the grammar. * The dialog manager controls turn-by-turn behavior. A simple dialog system may ask the user questions then act on the response. Such directed dialog systems use a tree-like structure for control; frame- (or form-) based systems allow for some user initiative and accommodate different styles of interaction. More sophisticated dialog managers incorporate mechanisms for dealing with misunderstandings and clarification. * The domain reasoner, or more simply the back-end, makes use of a knowledge base to retrieve information and helps formulate system responses. In simple systems, this may be a database which is queried using information collected through the dialog. The domain reasoner, together with the dialog manager, maintain the context of interaction and allows the system to reflect some human conversational abilities (for example using anaphora). * Response generation is similar to text-based natural language generation, but takes into account the needs of spoken communication. This might include the use of simpler grammatical constructions, managing the amount of information in any one output utterance and introducing prosodic markers to help the human participant absorb information more easily. A complete system design will also introduce elements of lexical entrainment, to encourage the human user to favor certain ways of speaking, which in turn can improve recognition performance. * Text-to-speech synthesis (TTS) realizes an intended utterance as speech. Depending on the application, TTS may be based on concatenation of pre-recorded material produced by voice professionals. In more complex applications TTS will use more flexible techniques that accommodate large vocabularies and that allow the developer control over the character ("personality") of the system. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Spoken dialog systems」の詳細全文を読む スポンサード リンク
|